† Corresponding author. E-mail:
Ribonucleic acids (RNAs) play a vital role in biology, and knowledge of their three-dimensional (3D) structure is required to understand their biological functions. Recently, structure prediction methods have been developed to address this need, but most existing methods predict a series of candidate RNA 3D structures rather than a single one. Therefore, evaluating the predicted structures is indispensable. Although several methods have been proposed to assess RNA 3D structures, they are not precise enough. In this work, a new all-atom knowledge-based potential is developed for more accurately evaluating RNA 3D structures. The potential not only includes local and nonlocal interactions but also accounts for the specificity of each RNA by introducing a retraining mechanism. On extensive test sets generated by independent methods, the proposed potential correctly distinguished the native state and ranked near-native conformations so that the best ones could be selected. Furthermore, the proposed potential precisely captured RNA structural features such as base-stacking and base-pairing. Comparisons with existing potentials show that the proposed potential is reliable and accurate in RNA 3D structure evaluation.
In living systems, RNAs perform essential roles including transmission of genetic information, regulation of gene expression, and catalysis of biochemical reactions.[1–3] Understanding and utilizing these functions require comprehensive knowledge of RNA 3D structures. However, obtaining high-resolution RNA 3D structures through experimental methods such as x-ray crystallography or NMR is a challenging task.[4] As an alternative, computational methods for RNA 3D structure prediction have been developed.[5–20] However, for a given RNA sequence, the existing methods usually generate a set of possible near-native conformations rather than a single best one.[5,7,8,14,17,21] For example, the recent coarse-grained model presented by Tan’s group predicts a series of 3D structures (or decoys) for a 25-nt RNA hairpin with a mean RMSD of 2.5 Å, although the minimum RMSD is 1.0 Å.[13] Therefore, selecting the best structures from an ensemble of conformations is a vital but challenging task.[5,22] The same problem arises in the evaluation of protein tertiary structures[23–25] and protein–RNA/DNA complexes.[26,27]
In recent years, several potentials have been proposed for RNA 3D structure evaluation, e.g., the nucleic acid simulation tool (NAST),[28] Rosetta,[8,29] the ribonucleic acids statistical potential (RASP),[30,31] the RNA coarse-grained and all-atom knowledge-based (RNA KB) potentials,[32] and 3dRNAscore.[33] NAST can generate, cluster, and rank RNA tertiary structures using an RNA-specific knowledge-based potential in a coarse-grained molecular dynamics (MD) engine; however, it needs a secondary structure, tertiary contact information, and some experimental data. In contrast, Rosetta, also known as FARNA/FARFAR, is successful in predicting and evaluating RNA 3D structures from sequence alone, but only applies to small RNAs. RASP is another full-atom potential that explicitly includes base-pairing and base-stacking interactions and is able to discriminate between near-native and misfolded RNA conformations, even those including several non-canonical base pairs. However, this potential is derived from only 85 RNA structures, and its precision could be further improved by involving more diverse RNA structures.[34] The RNA KB potential, which is derived from diverse RNAs of six different families, effectively assesses different types of RNA conformations and can be used as a force field in molecular dynamics. Nonetheless, the above potentials only include interactions between two atoms in different nucleotides and could ignore the fluctuations of local structures.[35,36] Very recently, a novel full-atom statistical potential, 3dRNAscore, developed by Xiao’s group performed better than the above potentials in identifying RNA native structures from a pool of decoys by combining the distance-dependent energies with a new dihedral-dependent energy.
However, 3dRNAscore, like the other existing potentials, considers only the common features of the limited experimental structures in a training set[37] and ignores the differences among individual RNAs, which can be significant in RNA folding.[38,39]
Here, we propose a new all-atom knowledge-based potential to accurately assess RNA tertiary structures. The proposed potential has the following features: (i) it is derived from a set of 380 non-redundant RNA structures that covers most of the common RNAs and motifs, (ii) it considers the local interactions between two atoms within one nucleotide by adding a new energy term that efficiently represents the local geometrical features of RNA structures and depicts the flexibility of RNAs,[40,41] and (iii) it introduces a retraining process for each RNA to capture the individual variation of different RNAs, in which the potential is optimized based on several low-energy conformations from the initial scoring. Moreover, the proposed potential is validated on different test sets widely used in recent works to analyze the performance of knowledge-based potentials. Our results show that the proposed potential effectively evaluates RNA 3D structures: it selects the native structure from a pool of RNA conformations, ranks near-native structures to identify the best ones, and captures base-base interactions. Furthermore, the proposed potential is compared with existing methods such as RASP, RNA KB, 3dRNAscore, and Rosetta to show its feasibility in the assessment of RNA 3D structures.
The thermodynamic hypothesis proposed by Anfinsen is that the native state tends to have the lowest free energy.[42] To evaluate an RNA 3D structure, the proposed potential considers two energy terms. The total energy utotal of a structure is given by
The energies between the two atoms in Eq. (
The parameters of the proposed potential were calculated based on the statistical analysis of known RNA 3D structures in the PDB/NDB database. First, we gathered 1369 non-redundant structures including most of the common RNAs (e.g., mRNA, rRNA, tRNA, ribozyme, and riboswitch) and various RNA complexes where only RNAs were taken into account. Second, we discarded the structures with sequence identities greater than 80% using the BLASTN program with default options[44] and removed low-quality structures with a resolution
In structure ranking, the initial potential obtained above can be further optimized for decoys of an RNA by a retraining mechanism (Fig. S5). In the retraining process of each RNA, based on the energies calculated by the initial potential, structures with the lowest energies are used as a new training set to further obtain an RNA-specific potential, and energies of all conformations can be calculated again by the retrained potential (Fig. S5(a)). Note that the number of conformations with low energies is generally taken as 10, where the potential has the highest precision (Fig. S5(b)).
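The retraining loop described above can be illustrated with a minimal sketch. It assumes that each conformation has already been reduced to a list of (atom-pair type, distance-bin) observations, and it uses a simple Boltzmann-inversion potential with an averaged reference state and pseudocounts. The function names `train_potential`, `score`, and `retrain_and_rank`, the data layout, and the pseudocount are hypothetical choices for illustration and are not taken from the authors' implementation; in particular, how the retrained potential is combined with the initial one is an assumption (here the retrained potential simply replaces it).

```python
import math
from collections import Counter


def train_potential(structures, n_bins, pseudocount=1.0):
    """Build a distance-dependent statistical potential (in units of kB*T).

    Each structure is a list of (pair_type, bin_index) observations.
    Reference state (assumption): the distance distribution averaged
    over all pair types.
    """
    obs, per_type, per_bin = Counter(), Counter(), Counter()
    total = 0
    for structure in structures:
        for pair_type, b in structure:
            obs[(pair_type, b)] += 1
            per_type[pair_type] += 1
            per_bin[b] += 1
            total += 1

    potential = {}
    for pair_type, n_type in per_type.items():
        for b in range(n_bins):
            # Pseudocounts keep unobserved bins finite.
            f_obs = (obs[(pair_type, b)] + pseudocount) / (n_type + pseudocount * n_bins)
            f_ref = (per_bin[b] + pseudocount) / (total + pseudocount * n_bins)
            potential[(pair_type, b)] = -math.log(f_obs / f_ref)
    return potential


def score(structure, potential):
    """Sum pair energies; observations unknown to the potential contribute 0."""
    return sum(potential.get(o, 0.0) for o in structure)


def retrain_and_rank(decoys, initial_potential, n_bins, n_select=10):
    """Score decoys, retrain on the n_select lowest-energy ones, and re-rank."""
    initial = [score(d, initial_potential) for d in decoys]
    order = sorted(range(len(decoys)), key=initial.__getitem__)
    low_energy = [decoys[i] for i in order[:n_select]]

    retrained = train_potential(low_energy, n_bins)  # RNA-specific potential
    final = [score(d, retrained) for d in decoys]
    ranking = sorted(range(len(decoys)), key=final.__getitem__)
    return ranking, final
```

The default `n_select=10` mirrors the number of low-energy conformations quoted in the text; other values can be passed to explore its effect on the final ranking.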
To test the performance of the proposed potential, we used two different decoy sets widely used in recent works to analyze the performance of knowledge-based potential.[8,29–33] Test set I, which has been used by RASP and 3dRNAscore, consisted of decoys generated from 85 native structures (only with normal heavy atoms) by MODELLER[45] with a set of Gaussian restraints for dihedral angles and bond stretches and can be downloaded from
The enrichment score (ES) is often used to describe the performance of a scoring function in identifying the best RNA structures.[32,49] The ES is defined as
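As an illustration, the sketch below computes an enrichment score under the form commonly used for RNA scoring functions, ES = |E_top10% ∩ R_top10%| / (0.1 × 0.1 × N_decoys), where E_top10% and R_top10% denote the decoys in the lowest 10% by energy and by RMSD, respectively. This form is an assumption made here for the example, and the function name `enrichment_score` is hypothetical.

```python
def enrichment_score(energies, rmsds, fraction=0.1):
    """Overlap between the lowest-energy and lowest-RMSD decoys,
    normalized by the overlap expected for a random ranking."""
    n = len(energies)
    if n == 0 or n != len(rmsds):
        raise ValueError("energies and rmsds must be non-empty and equally long")

    k = max(1, int(round(fraction * n)))  # size of each top subset
    top_energy = set(sorted(range(n), key=lambda i: energies[i])[:k])
    top_rmsd = set(sorted(range(n), key=lambda i: rmsds[i])[:k])

    return len(top_energy & top_rmsd) / (fraction * fraction * n)
```

Under this definition, a perfect scoring function gives ES = 10, a random one gives ES close to 1, and values below 1 indicate a ranking worse than random. A DI-based ES can be obtained by passing deformation-index values in place of `rmsds`.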
Owing to the different sample sizes for the two potentials in Eq. (
The proposed knowledge-based potential can be used to select the native state, to rank near-native structures and identify the best ones, and to capture RNA structural features. Compared with existing potentials, the proposed potential performs much better in RNA 3D structure assessment.
To test the performance of the proposed potential, we first used it to select the native structure from decoys based on the thermodynamic hypothesis.[42] For the decoys in Test sets I and II and the corresponding native structures, we used the proposed potential with the initial parameters to calculate the energy of each conformation and selected the conformation with the lowest energy as the native structure. As shown in Fig.
We also compared the proposed potential against well-established existing potentials such as RASP, KB, and Rosetta in identifying the native structures from different decoys. We employed the RASP, KB, and Rosetta potentials with corresponding programs or software to select native structures for the two test sets, and the results are shown in Fig.
It is noted that several native structures, such as 1ESY, 1KKA, and 1QWA in the test sets, could not be correctly selected by either the proposed potential or the other existing potentials as shown in Fig.
The identification of the best structures is a vital task in RNA tertiary structure prediction.[8,13] For example, although a recent method using evolutionary restraints of RNAs was proposed to predict RNA 3D structures, it still needs to recognize the best conformations among the predicted structures using a knowledge-based potential such as 3dRNAscore.[21] However, accurately recognizing the best conformations from a series of predicted conformations remains challenging for the existing predictive methods. Thus, the precise identification of the best RNA conformations from a set of near-native structures is an essential purpose of a scoring function. To assess the quality of our potential, we employed it to discriminate and rank the near-native conformations of RNAs.
For each RNA in Test set II, we first calculated the energies of all decoys excluding the native structure by using the proposed potential[30,33] and ranked the decoys based on energy (Fig. S5). Then, taking the 10 decoys with the lowest energies as a new training set, we trained the potential again to account for the characteristics of the RNA.[37,39] This retraining step proved to be effective for improving the accuracy of RNA structure evaluation. Finally, we ranked the decoys of the RNA again based on the energies scored by the retrained potential (Figs. S6 and S7) and assumed that the lower the energy, the better the structure.
As shown in Fig.
In addition, we benchmarked the performance of 3dRNAscore, RASP, KB, Rosetta, and the proposed potential in selecting the best structures on Test set II using both RMSD-based ES and DI-based ES. As shown in Fig.
Figure
Generally, base-pairing and base-stacking interactions provide a strong force in stabilizing RNA 3D structures.[17,54] Thus, the capacity to precisely capture base-pairing/stacking interactions is one of the significant criteria for determining the quality of potentials. We further analyzed the energies between the specific atom pairs involved in base-pairing and base-stacking such as N1 of adenine and N3 of uracil in the proposed potential (Fig.
Based on the carefully selected training set shown in Figs.
To clarify the contributions of the two improvements, we further performed two additional tests for each RNA decoy in Test set II using our potential without the uintra or the retraining mechanism (Table sup). As shown in Fig.
However, for some RNAs such as 1DQF, 1I9X, 1J6S, 1KD5, 1KKA, and 1MHK in the FARNA decoy set, the employment of the retraining process can result in a decrease of ES (Table S4). The possible reason is that the retraining process is based on the results generated from the initial scoring; however, for these RNAs with complex tertiary interactions (e.g., non-native hydrogen bonds), the initial scoring obtained by the proposed potential as well as the other existing potentials still needs improvement.
Although the knowledge-based statistical potential has proven to be an efficient method for structural evaluation, the selection of an effective reference state is still an inherent limitation of knowledge-based scoring functions. Scoring the same decoys with different reference states can lead to different results.[58,59] Since an ideal reference state is not achievable, current statistical potentials generally construct reference states by randomizing disconnected atoms, e.g., the averaged reference state in 3dRNAscore and RASP and the quasi-chemical approximation reference state in KB, which ignore the diversity of various RNAs.[60] Therefore, on the basis of the averaged reference state, we further propose a retraining mechanism in the proposed potential to distinguish the characteristics of different RNAs, which has proved to be very effective in assessing RNA 3D structures (Fig.
In this work, we developed a novel all-atom knowledge-based potential to evaluate and analyze RNA 3D structures. First, the proposed potential correctly selected native structures from different decoy sets by adding the energies between two atoms within one nucleotide to the conventional statistical potential. Second, the proposed potential was effective in ranking near-native structures for various decoys and then selecting the best ones by introducing a retraining mechanism to distinguish the characteristics of different RNAs. Third, the benchmark test against the existing potentials on extensive test sets showed that the proposed potential performed better than the others in RNA 3D structure evaluation. Finally, the proposed potential effectively and precisely captured the features of RNA structure, such as Watson–Crick base-pairing and base-stacking.
Despite this success, the proposed potential has some limitations. For example, the averaged reference state used by the proposed potential and other potentials has its limitations.[58] Although our proposed retraining process overcame this insufficiency to some degree, we believe that a unified potential should be proposed by including more detailed energies to solve the problem.[61–63] Furthermore, since RNAs are strongly charged polyanionic chains, there is a strong intrachain Coulombic repulsion during RNA folding.[54] Therefore, RNA 3D structures can be highly sensitive to ion conditions.[64–68] However, the existing knowledge-based potentials, including the proposed potential, consider this effect only implicitly through the statistics of experimental structures and cannot be used to assess RNA structures under different ion conditions. Moreover, although the tests in this work show the reliability of our potential, more comprehensive tests on various decoy sets, such as RNA conformations from RNA-Puzzles,[5] will be performed in the future, and software implementing the proposed potential will then be developed.
[1] | |
[2] | |
[3] | |
[4] | |
[5] | |
[6] | |
[7] | |
[8] | |
[9] | |
[10] | |
[11] | |
[12] | |
[13] | |
[14] | |
[15] | |
[16] | |
[17] | |
[18] | |
[19] | |
[20] | |
[21] | |
[22] | |
[23] | |
[24] | |
[25] | |
[26] | |
[27] | |
[28] | |
[29] | |
[30] | |
[31] | |
[32] | |
[33] | |
[34] | |
[35] | |
[36] | |
[37] | |
[38] | |
[39] | |
[40] | |
[41] | |
[42] | |
[43] | |
[44] | |
[45] | |
[46] | |
[47] | |
[48] | |
[49] | |
[50] | |
[51] | |
[52] | |
[53] | |
[54] | |
[55] | |
[56] | |
[57] | |
[58] | |
[59] | |
[60] | |
[61] | |
[62] | |
[63] | |
[64] | |
[65] | |
[66] | |
[67] | |
[68] |